Under the assumption of no-arbitrage, the pricing of American
and Bermudan options can be cast as an optimal stopping problem.
We propose a new adaptive simulation-based algorithm for the
numerical solution of optimal stopping problems in discrete time. Our
approach is to recursively compute the so-called continuation
values. They are defined as regression functions of the cash flow that
would occur over a series of subsequent time periods if the
approximated optimal exercise strategy were applied. We use nonparametric
least squares regression estimates to approximate the continuation
values from a set of sample paths which we simulate from the
underlying stochastic process. The parameters of the regression estimates
and the regression problems are chosen in a data-dependent manner.
We present results concerning the consistency and rate of convergence
of the new algorithm. Finally, we illustrate its performance by
pricing high-dimensional Bermudan basket options with strangle-spread
payoff based on the average of the underlying assets.

1. Introduction. Many financial contracts allow for early exercise before
expiry. Most exchange-traded option contracts are either of the American
type, which allows the holder to choose any exercise date before expiry, or
of the Bermudan type, with exercise dates restricted to a predefined
discrete set of dates. Mortgages have embedded prepayment options such
that the mortgage can be amortized or repaid. Also, life insurance
contracts may allow for early surrender. In this paper we are interested
in pricing options with early exercise features. It is well known that in
complete and arbitrage-free markets the price of a derivative security can
be represented as an expected value with respect to the so-called
martingale measure; see, for instance, [16]. Furthermore, the price of an
American option with maturity $T$ is given by the value of the optimal
stopping problem

(1.1)  $V_0 = \sup_{\tau \in \mathcal{T}([0,T])} E\{d_{0,\tau} f_\tau(X_\tau)\},$

where $f_t$ is a nonnegative payoff function, $X_t$ is a stochastic process which
models the relevant risk factors, $\mathcal{T}([0,T])$ is the class of all stopping
times with values in $[0,T]$, and $d_{s,t}$ are nonnegative
$\mathcal{F}((X_u)_{s \le u \le t})$-measurable discount factors satisfying
$d_{0,t} = d_{0,s} \cdot d_{s,t}$ for $s < t$. In practice, the process $X_t$ is often
a geometric Brownian motion, as, for instance, in the celebrated Black- 
Scholes setting. A more general class of models is obtained with diffusions, 
jump-diffusion processes or nonparametric time series models. The model 
parameters are usually calibrated to observed time series data. 

The first step in addressing the numerical solution of (1.1) is to pass 
from continuous time to discrete time, which means in financial terms to 
approximate the American option by a Bermudan option. The convergence 
of the discrete time approximations to the continuous time optimal stopping 
problem is considered in [18] for the Markovian case but also in the abstract 
setting of general stochastic processes. 

For simplicity, we restrict ourselves directly to a discrete time scale and 
consider exclusively Bermudan options. In analogy to (1.1), the price of a 
Bermudan option is the value of the discrete time optimal stopping problem 

(1.2)  $V_0 = \sup_{\tau \in \mathcal{T}(0,\ldots,T)} E\{d_{0,\tau} f_\tau(X_\tau)\},$

where $X_0, X_1, \ldots, X_T$ is now a discrete time stochastic process, and
$\mathcal{T}(0,\ldots,T)$ is the class of all $\{0,\ldots,T\}$-valued stopping times.
For additional theoretical background on the valuation of Bermudan options,
we refer to [25].

In the sequel we assume that $X_0, X_1, \ldots, X_T$ is a $[-A,A]^d$-valued Markov
process recording all necessary information about financial variables,
including prices of the underlying assets as well as additional risk factors
driving stochastic volatility or stochastic interest rates. We also assume
that the law of $X_0, \ldots, X_T$ is known, such that we can draw random sample
paths as well as partial sample paths $X_t, \ldots, X_T$ for arbitrary starting
values of $X_t$. Neither the Markov property nor the form of the payoff as a
function of the state of $X_t$ is restrictive; both can always be achieved by
including supplementary variables. For instance, in the case of an Asian
option we add the running mean as an additional variable into $X_t$. Because
the diffusion, jump-diffusion or time series models which appear in practical
applications lead to unbounded stochastic processes for the underlying state
variables $X_t$, they must be suitably localized to a bounded set $[-A,A]^d$.

The boundedness assumption $X_t \in [-A,A]^d$ then allows us to estimate
the price of the Bermudan option from samples of polynomial size in the
number of free parameters. This is in contrast to Glasserman and Yu [13].
Their work does not impose a boundedness assumption on the underlying 
process and shows that for arithmetic and geometric Brownian motions, the 
sample size must grow exponentially in the number of free parameters in 
order to retain a convergent estimator. 

The computation of (1.2) requires the determination of an optimal stopping
rule $\tau^* \in \mathcal{T}(0,\ldots,T)$ which satisfies

(1.3)  $V_0 = E\{d_{0,\tau^*} f_{\tau^*}(X_{\tau^*})\}.$

Let

(1.4)  $q_t(x) = \sup_{\tau \in \mathcal{T}(t+1,\ldots,T)} E\{d_{t,\tau} f_\tau(X_\tau) \mid X_t = x\}$

be the so-called continuation value describing the value of the option at time
$t$ given $X_t = x$ and subject to the constraint of holding the option at time $t$
rather than exercising it. The general theory of optimal stopping for Markov
processes (see, e.g., [5, 11, 22, 26]) implies that

$\tau^* = \inf\{s \ge 0 : q_s(X_s) \le f_s(X_s)\}$

is an optimal stopping time, that is, $\tau^*$ satisfies (1.3). Therefore, computing
the continuation values (1.4) solves the optimal stopping problem (1.2).

Explicit solutions of (1.2) do not exist, except in very rare cases, but there 
are a variety of numerical procedures to solve optimal stopping problems, 
each with its strengths and weaknesses. In this paper we study a concrete
simulation algorithm. The first attempts to use simulation are [2, 3, 28]. 
Longstaff and Schwartz [21] introduce a new algorithm for Bermudan op- 
tions in discrete time. It combines Monte Carlo simulation with multivariate 
function approximation. Tsitsiklis and Van Roy [29] independently propose 
an alternative parametric approximation algorithm using stochastic approx- 
imation to derive the weights of the approximation. Both algorithms ap- 
proximate the value function or the early exercise rule and therefore provide 
a lower bound for the true optimal stopping value. Upper bounds based on 
the dual problem are derived in [15, 23]. More details and further references 
can be found in [4] and [12]. The article [19] compares several Monte Carlo 
approaches empirically. 

In this paper we enhance the approach of [21] and its generalization
presented in [10]. We construct estimates $\hat{q}_t$ of $q_t$ and approximate the
optimal stopping rule $\tau^*$ by

(1.5)  $\hat{\tau} = \inf\{s \ge 0 : \hat{q}_s(X_s) \le f_s(X_s)\}.$

Then, a Monte Carlo estimate of

(1.6)  $E\{d_{0,\hat{\tau}} f_{\hat{\tau}}(X_{\hat{\tau}})\}$

D. EGLOFF, M. KOHLER AND N. TODOROVIC 



provides a lower bound for the price $V_0$ of the Bermudan option.

To this end, we represent $q_t$ as a regression function of a distribution
$(X_t, Y_t)$, where $Y_t$ depends on the partial sample path $X_{t+1}, \ldots, X_{t+w+1}$ and
$q_{t+1}, \ldots, q_{t+w+1}$ for some tunable parameter $w \in \{0, 1, \ldots, T-t-1\}$. This
distribution will in turn be approximated by $(X_t, \hat{Y}_t)$, where $\hat{Y}_t$ depends on
$X_{t+1}, \ldots, X_{t+w+1}$ and $\hat{q}_{t+1}, \ldots, \hat{q}_{t+w+1}$. We construct an estimate $\hat{q}_t$ of $q_t$
with nonparametric regression techniques applied to a Monte Carlo sample
of the distribution $(X_t, \hat{Y}_t)$ and use this estimate together with
$\hat{q}_{t+1}, \ldots, \hat{q}_{t+w}$ to compute recursive estimates of $q_{t-1}, \ldots, q_0$. Our
algorithm is adaptive in the sense that all parameters of the estimates and
the parameter $w$ of the distribution of $(X_t, \hat{Y}_t)$ are chosen in a
data-dependent manner.

We proceed as follows. In Section 2 we describe in detail the connection 
between discrete time optimal stopping problems and recursive regression. 
The dynamic look-ahead Monte Carlo algorithm for solving optimal stopping 
problems is introduced in Section 3. The main theoretical results, including 
the consistency and the rate of convergence of the algorithm, are presented 
in Section 4. The finite sample properties of the proposed algorithm are 
illustrated in Section 5 with a simulation study. Section 6 contains the proofs. 

2. Discrete time optimal stopping and recursive regression. Let
$X = (X_t)_{t=0,\ldots,T}$ be a discrete time Markov process with values in
$\mathbb{R}^d$, $\mu_t$ the law induced by $X_t$ on $\mathbb{R}^d$, and
$\mathbb{F} = (\mathcal{F}_t)$ be the induced filtration, where

(2.1)  $\mathcal{F}_t = \mathcal{F}(X_0, \ldots, X_t) = \bigvee_{s \le t} \sigma(X_s)$

is the sigma algebra generated by the random variables $\{X_s \mid s \le t\}$. The
solution of the discrete time optimal stopping problem for nonnegative reward
or payoff functions $f_t$ is given by the value function

(2.2)  $v_t(x) = \sup_{\tau \in \mathcal{T}(t,\ldots,T)} E[f_\tau(X_\tau) \mid X_t = x].$

The supremum runs over the class $\mathcal{T}(t,\ldots,T)$ of all $\mathbb{F}$-stopping times
with values in $\{t, \ldots, T\}$. By definition, each $\tau \in \mathcal{T}(t,\ldots,T)$ satisfies
$\{\tau = k\} \in \mathcal{F}(X_0, \ldots, X_k)$ for $k \in \{t, \ldots, T\}$. Here and in the sequel we
assume for notational simplicity that $f_t$ already contains the discount
factor occurring in (1.2). Once the value function has been determined, the
smallest optimal stopping time as of time $t$ can be derived as

(2.3)  $\tau_t^* = \inf\{s \ge t \mid v_s(X_s) \le f_s(X_s)\}.$

The optimal stopping problem can also be characterized in terms of the 
so-called continuation value, which is given by 

(2.4)  $q_t(x) = \sup_{\tau \in \mathcal{T}(t+1,\ldots,T)} E[f_\tau(X_\tau) \mid X_t = x] = E[f_{\tau_{t+1}^*}(X_{\tau_{t+1}^*}) \mid X_t = x]$

for $t \le T-1$ and set to $q_T = 0$ at maturity $T$. The value function and the
continuation value are related by

(2.5)  $v_t(X_t) = \max(f_t(X_t), q_t(X_t)), \qquad q_t(X_t) = E[v_{t+1}(X_{t+1}) \mid X_t].$

From now on we primarily consider $q_t$. The continuation value satisfies the
dynamic programming equations

(2.6)  $q_T(x) = 0, \qquad q_t(x) = E[\max(f_{t+1}(X_{t+1}), q_{t+1}(X_{t+1})) \mid X_t = x].$

The recursion for the optimal stopping rules is given by

(2.7)  $\tau_T^* = T, \qquad \tau_t^* = t \cdot 1_{\{q_t(X_t) \le f_t(X_t)\}} + \tau_{t+1}^* \cdot 1_{\{q_t(X_t) > f_t(X_t)\}}.$
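
The recursion (2.6) and the stopping rule (2.7) can be illustrated on a toy model in which the conditional expectation is an exact two-point average. The following is a minimal sketch only: the symmetric +/-1 random walk, the payoff `max(K - x, 0)`, the horizon `T = 3` and the strike `K = 0` are assumptions chosen for this example and are not taken from the paper.

```python
from functools import lru_cache

T = 3      # hypothetical number of time steps
K = 0.0    # hypothetical strike


def payoff(t, x):
    """Nonnegative payoff f_t(x); discounting is assumed to be absorbed into f."""
    return max(K - x, 0.0)


@lru_cache(maxsize=None)
def q(t, x):
    """Continuation value q_t(x) computed from the recursion (2.6)."""
    if t == T:
        return 0.0
    # Toy transition: X_{t+1} = x + 1 or x - 1, each with probability 1/2.
    return 0.5 * sum(max(payoff(t + 1, y), q(t + 1, y)) for y in (x + 1, x - 1))


def value(t, x):
    """Value function v_t(x) = max(f_t(x), q_t(x)), cf. (2.5)."""
    return max(payoff(t, x), q(t, x))


def stop_now(t, x):
    """Exercise indicator of (2.7): stop as soon as q_t(x) <= f_t(x)."""
    return q(t, x) <= payoff(t, x)
```

In this toy model the state is a martingale and the payoff is convex, so `q(0, 0)` equals the expected terminal payoff 0.75, and the rule does not exercise at time 0 when started in 0.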

The dynamic programming equations (2.6) show that the optimal stopping 
problem in discrete time is essentially equivalent to a series of regression 
problems. Equation (2.4) provides a different regression representation of the 
continuation value, once the optimal stopping rule of the next future period 
is known. These representations are extreme cases, as we will explain in the 
following. For $h_t \in L_1(\mu_t)$ with $h_T = f_T$, we define on $(\mathbb{R}^d)^{w+1}$ the

function



$\vartheta_{t:w}(f, h_t, \ldots, h_{t+w})(x_t, \ldots, x_{t+w})$

(2.8)  $= \sum_{s=t}^{t+w} f_s(x_s) \, 1_{\{f_s(x_s) - h_s(x_s) \ge 0\}} \prod_{r=t}^{s-1} 1_{\{f_r(x_r) - h_r(x_r) < 0\}}$

$\quad + \, h_{t+w}(x_{t+w}) \prod_{r=t}^{t+w} 1_{\{f_r(x_r) - h_r(x_r) < 0\}},$

where we follow the convention that the product over an empty index set 
is equal to one. In the following, to reduce notational overhead, we simply 
write 

(2.9)  $\vartheta_{t:w}(f,h) = \vartheta_{t:w}(f, h_t, \ldots, h_{t+w}),$

thereby implicitly assuming that $\vartheta_{t:w}(f,h)$ depends solely on $h_t, \ldots, h_{t+w}$.

In a financial context the function $\vartheta_{t:w}(f,h)$ has a natural interpretation
as the future payoff we would get by holding the Bermudan option for at
most $w$ periods, applying the stopping rule $\tau_t(h) \wedge (t+w)$, which is defined
recursively by

(2.10)  $\tau_T(h) = T, \qquad \tau_t(h) = t \cdot 1_{\{f_t(X_t) - h_t(X_t) \ge 0\}} + \tau_{t+1}(h) \cdot 1_{\{f_t(X_t) - h_t(X_t) < 0\}},$

and selling the option at time $t+w$ for the price $h_{t+w}(X_{t+w})$ if it has not
been exercised before.
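
The sum-of-products formula for the look-ahead payoff reduces to a simple loop: walk along the path, exercise at the first time the payoff is at least the candidate continuation value, and otherwise sell at the end of the window. The sketch below assumes that `f` and `h` are callables of the form `f(s, x)` and `h(s, x)`; this calling convention and the numbers in the usage example are illustrative assumptions, not part of the paper.

```python
def theta(f, h, t, w, path):
    """Evaluate the look-ahead payoff theta_{t:w}(f, h) on (x_t, ..., x_{t+w}).

    path[j] is the state x_{t+j} for j = 0, ..., w.
    """
    for j in range(w + 1):
        s, x = t + j, path[j]
        if f(s, x) - h(s, x) >= 0:
            # First time the rule tau_t(h) exercises: collect the payoff.
            return f(s, x)
    # Not exercised within the window: sell at time t + w for h_{t+w}(x_{t+w}).
    return h(t + w, path[w])


# Hypothetical usage: put-like payoff and a constant candidate continuation value.
example_f = lambda s, x: max(1.0 - x, 0.0)
example_h = lambda s, x: 0.5
exercised = theta(example_f, example_h, 0, 2, [1.0, 0.8, 0.3])  # stops at the third state
held = theta(example_f, example_h, 0, 2, [1.0, 0.9, 0.8])       # never stops, returns h
```

Returning at the first exercise time is equivalent to the product of indicators in the formula, because every later summand then contains a vanishing factor.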

We now come back to the generalization of the regression representations
(2.4) and (2.6). First note that $\max(f_{t+1}, q_{t+1}) = \vartheta_{t+1:0}(f,q)$ and, therefore,

(2.11)  $q_t(x) = E[\vartheta_{t+1:0}(f,q)(X_{t+1}) \mid X_t = x].$

On the other hand, the recursive formula (2.7) for the optimal stopping rule
$\tau_t^*$ shows that

$f_{\tau_{t+1}^*}(X_{\tau_{t+1}^*}) = f_{\tau_{t+1}(q)}(X_{\tau_{t+1}(q)}) = \vartheta_{t+1:T-t-1}(f,q)(X_{t+1}, \ldots, X_T),$

such that we also have [cf. (2.4)]

(2.12)  $q_t(x) = E[\vartheta_{t+1:T-t-1}(f,q)(X_{t+1}, \ldots, X_T) \mid X_t = x].$

More generally, we have for any $0 \le w \le T-t-1$ the representation

(2.13)  $q_t(x) = E[\vartheta_{t+1:w}(f,q)(X_{t+1}, \ldots, X_{t+w+1}) \mid X_t = x].$

To prove (2.13), we start with

(2.14)  $q_t(X_t) = E[\max(f_{t+1}(X_{t+1}), q_{t+1}(X_{t+1})) \mid X_t]$
$\quad = E[f_{t+1}(X_{t+1}) \, 1_{\{f_{t+1}(X_{t+1}) - q_{t+1}(X_{t+1}) \ge 0\}} + q_{t+1}(X_{t+1}) \, 1_{\{f_{t+1}(X_{t+1}) - q_{t+1}(X_{t+1}) < 0\}} \mid \mathcal{F}_t],$

where we have used the Markov property in the second equality. Then we
expand $q_{t+1}(X_{t+1})$ in (2.14) by

$E[f_{t+2}(X_{t+2}) \, 1_{\{f_{t+2}(X_{t+2}) - q_{t+2}(X_{t+2}) \ge 0\}} + q_{t+2}(X_{t+2}) \, 1_{\{f_{t+2}(X_{t+2}) - q_{t+2}(X_{t+2}) < 0\}} \mid \mathcal{F}_{t+1}]$

and proceed recursively up to $t+w+1$. Equation (2.13) follows from the
projection property $E[E[\,\cdot \mid \mathcal{F}_{t+1}] \mid \mathcal{F}_t] = E[\,\cdot \mid \mathcal{F}_t]$ of conditional
expectations and by another application of the Markov property.
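
The representation (2.13) can be checked numerically on a small model where all expectations are finite sums. The sketch below is an illustration under assumptions that are not in the paper: a symmetric +/-1 random walk, a discounted put-like payoff with hypothetical strike `K = 1` and discount factor 0.9 per step, and horizon `T = 4`. It computes the exact continuation value by dynamic programming and compares it with the average of the look-ahead payoff over all paths, for every admissible w.

```python
from itertools import product

T, K = 4, 1.0   # hypothetical horizon and strike


def f(t, x):
    """Discounted payoff f_t(x) (toy choice)."""
    return 0.9 ** t * max(K - x, 0.0)


def q(t, x):
    """Exact continuation value from the dynamic programming recursion."""
    if t == T:
        return 0.0
    return 0.5 * sum(max(f(t + 1, y), q(t + 1, y)) for y in (x + 1, x - 1))


def theta(t, w, path):
    """Look-ahead payoff theta_{t:w}(f, q) along path = (x_t, ..., x_{t+w})."""
    for j, x in enumerate(path):
        if f(t + j, x) - q(t + j, x) >= 0:
            return f(t + j, x)
    return q(t + w, path[w])


def q_lookahead(t, x, w):
    """Right-hand side of (2.13): average over all 2**(w+1) equally likely paths."""
    total = 0.0
    for steps in product((1, -1), repeat=w + 1):
        path, y = [], x
        for s in steps:
            y += s
            path.append(y)
        total += theta(t + 1, w, path)
    return total / 2 ** (w + 1)
```

For every `w` between 0 and `T - t - 1` the value of `q_lookahead(t, x, w)` agrees with `q(t, x)` up to rounding, which is the content of (2.13) in this toy setting.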

3. Monte Carlo algorithms for optimal stopping. Equation (2.13) shows
that the continuation value $q_t$ at time $t$ can be obtained as the regression
function of $\vartheta_{t+1:w}(f,q)$ for some $0 \le w \le T-t-1$. Least squares Monte Carlo
methods, pioneered by [21] and extended in [10] to arbitrary $w$, recursively
estimate the regression function $q_t$ from independent sample paths of the
underlying Markov process $X_t$. Let

(3.1)  $X_{t+1:w} = (X_{t+1}, \ldots, X_{t+w+1})$

be the partial sample path of length $w$ starting at $t+1$. When it comes to
estimation of the continuation value $q_t$, these algorithms use the previously
determined estimates $\hat{q}_{t+1}, \ldots, \hat{q}_{t+w+1}$ for $q_{t+1}, \ldots, q_{t+w+1}$ to construct

(3.2)  $\hat{Y}_t = \vartheta_{t+1:w}(f, \hat{q})(X_{t+1:w}) = \vartheta_{t+1:w}(f, \hat{q}_{t+1}, \ldots, \hat{q}_{t+w+1})(X_{t+1:w}),$

which takes the role of the dependent variable of the regression problem for
time step $t$. The random variable $\hat{Y}_t$ is an estimate of the unknown optimal
reward

(3.3)  $Y_t = \vartheta_{t+1:w}(f, q)(X_{t+1:w}) = \vartheta_{t+1:w}(f, q_{t+1}, \ldots, q_{t+w+1})(X_{t+1:w}).$

Given independent sample paths

(3.4)  $X_i = (X_{i,t})_{t=0,\ldots,T}, \quad i = 1, \ldots, n,$

of the underlying Markov process $X$, the least squares estimate of $q_t$ is
obtained as

(3.5)  $\hat{q}_{n,t} = \arg\min_{h \in \mathcal{H}_{n,t}} \frac{1}{n} \sum_{i=1}^{n} |h(X_{i,t}) - \hat{Y}_{i,t}|^2,$

where

(3.6)  $\hat{Y}_{i,t} = \vartheta_{t+1:w}(f, \hat{q})(X_{i,t+1:w}), \qquad X_{i,t+1:w} = (X_{i,t+1}, \ldots, X_{i,t+w+1}),$

and $\mathcal{H}_{n,t}$ is a set of functions $h : \mathbb{R}^d \to \mathbb{R}$.
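
To make the regression step concrete, the following sketch carries out one least squares fit in the simplest possible setting. It assumes d = 1, look-ahead w = 0 (so the response is the maximum of payoff and estimated continuation value at the next date) and the two-dimensional linear space spanned by 1 and x as function class. These choices, and the helper names, are illustrative assumptions and not the function spaces studied in the paper.

```python
def regression_targets_w0(next_states, payoff_next, q_next):
    """Responses for w = 0: max(f_{t+1}(x), q_{t+1}(x)) at each sampled X_{i,t+1}."""
    return [max(payoff_next(x), q_next(x)) for x in next_states]


def fit_linear_least_squares(xs, ys):
    """Minimize (1/n) * sum_i (a + b * x_i - y_i)**2 over (a, b).

    Returns the fitted function h(x) = a + b * x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0   # degenerate design: fall back to a constant fit
    a = mean_y - b * mean_x
    return lambda x: a + b * x


# Hypothetical usage with made-up numbers.
targets = regression_targets_w0([0.5, 2.0], lambda x: max(1.0 - x, 0.0), lambda x: 0.2)
h_hat = fit_linear_least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A richer function class changes only the fitting routine; the construction of the responses from previously estimated continuation values stays the same.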

With $w = 0$, the above algorithm corresponds to the Tsitsiklis-Van Roy
algorithm [29], while $w = T-t-1$ has been proposed in [21]. The idea of
using an intermediate value $w \in \{0, 1, \ldots, T-t-1\}$ in order to
"interpolate" between these two algorithms has been introduced in [10]. A
further contribution of [10] is the consistency and the rate of convergence
of the above algorithm for fixed $w$ and fixed convex and uniformly bounded
function spaces $\mathcal{H}_{n,t}$, without imposing any distributional assumptions
on the underlying process $X_t$.

The boundedness assumption on $\mathcal{H}_{n,t}$ makes the computation of the least
squares estimate in (3.5) difficult because it leads to constrained optimiza- 
tion problems; see, for instance, [14], Section 10.1. In addition, the convexity 
assumption excludes promising choices like spaces of polynomial splines with 
free knots or spaces of artificial neural networks, which require restrictions on 
the number of knots or the number of hidden neurons, respectively, to con- 
trol the "complexity" of the function spaces. The resulting function spaces 
violate the convexity assumptions. Taking the convex hull instead is not an 
option because it would lead to function classes with a complexity that is 
much too high. Furthermore, in view of applications, it is desirable to choose
the parameters of the function spaces, and also the parameter $w$ of the
underlying regression problems, in a data-dependent way. In this paper we modify the above
algorithm such that this is possible. For simplicity, we restrict ourselves to 
function spaces, which are linear vector spaces, however, it is straightforward 
to derive similar results for spaces of polynomial splines with free knots or 
spaces of artificial neural networks. 

The main problem in analyzing the estimates $\hat{q}_{n,t}$ is the control of the
error propagation, that is, to answer the question of how the errors of
$\hat{q}_{n,t+1}, \ldots, \hat{q}_{n,t+w+1}$ influence the error of $\hat{q}_{n,t}$. At this stage
Egloff [10] uses the convexity of $\mathcal{H}_{n,t}$ to bound the $L_2$-error in terms of
the approximation error and a sample error derived from a suitably centered
loss function. The difficulty in obtaining error estimates comes from the fact
that $\hat{q}_{t+1}, \ldots, \hat{q}_{t+w+1}$ depend on a single set of sample paths (3.4)
and are thus dependent. Clement, Lamberton and Protter [6] face the same
difficulty while deriving a central limit theorem for the Longstaff-Schwartz
algorithm with linear approximation.

In the sequel we use a trick to simplify the analysis of the error
propagation. Instead of using the partial sample path $X_{t+1:w}$ of our training
data again, which we already used in part in the construction of the
estimates $\hat{q}_{n,t+1}, \ldots, \hat{q}_{n,t+w+1}$, we generate new data $X_{t+1:w}^{\mathrm{new}}$ which
are conditionally independent of all previously used data of times $s > t$,
given $X_t$ at time point $t$. We then construct samples of the distribution of
$(X_t, \hat{Y}_t^{w,\mathrm{new}})$, where

$\hat{Y}_t^{w,\mathrm{new}} = \vartheta_{t+1:w}(f, \hat{q}_{n,t+1}, \ldots, \hat{q}_{n,t+w+1})(X_{t+1:w}^{\mathrm{new}}).$

Since for $X_t$ given, the random variable $X_{t+1:w}^{\mathrm{new}}$ is independent of all
previously used data for all time points $s > t$, it is, in particular,
independent of the data used in the construction of
$\hat{q}_{n,t+1}, \ldots, \hat{q}_{n,t+w+1}$. Set

$q_t^{w,\mathrm{new}}(x) = E^*\{\hat{Y}_t^{w,\mathrm{new}} \mid X_t = x\},$

where in $E^*\{\cdot \mid X_t = x\}$ we take the conditional expectation with respect to
fixed $X_t = x$ and with all the data fixed which were used in the construction
of $\hat{q}_{n,t+1}, \ldots, \hat{q}_{n,t+w+1}$. Proposition 6.4 in [10] implies

(3.7)  $\left( \int |q_t^{w,\mathrm{new}}(x) - q_t(x)|^2 \, \mu_t(dx) \right)^{1/2} \le \sum_{s=t+1}^{t+w+1} \left( \int |\hat{q}_{n,s}(x) - q_s(x)|^2 \, \mu_s(dx) \right)^{1/2}.$

This allows us to control the error propagation. By induction, assume that
we have

(3.8)  $P\left\{ \int |\hat{q}_{n,s}(x) - q_s(x)|^2 \, \mu_s(dx) > c \cdot \sum_{r=s}^{T-1} \left( \delta_{n,r} + \min_{h \in \mathcal{H}_{n,r}} \int |h(x) - q_r(x)|^2 \, \mu_r(dx) \right) \right\} \to 0 \quad (n \to \infty)$

for $s \in \{t+1, \ldots, t+w+1\}$. Assume, in addition, that we are able to show

(3.9)  $P\left\{ \int |\hat{q}_{n,t}(x) - q_t^{w,\mathrm{new}}(x)|^2 \, \mu_t(dx) > c \cdot \left( \delta_{n,t} + \min_{h \in \mathcal{H}_{n,t}} \int |h(x) - q_t^{w,\mathrm{new}}(x)|^2 \, \mu_t(dx) \right) \right\} \to 0 \quad (n \to \infty),$

which is, for suitable $\delta_{n,t}$ (depending on the "complexity" of the function
spaces $\mathcal{H}_{n,t}$), a standard rate of convergence result for least squares estimates
from a sample of size $n$, where in the sample the response variables are
independent given the predictor variables and where the predictor variables
are independent; see [30] or [17].

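It can be shown that (3.7)-(3.9) imply

$P\left\{ \int |\hat{q}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) > c \cdot \sum_{s=t}^{T-1} \left( \delta_{n,s} + \min_{h \in \mathcal{H}_{n,s}} \int |h(x) - q_s(x)|^2 \, \mu_s(dx) \right) \right\} \to 0 \quad (n \to \infty).$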



Details concerning related arguments can be found in the proofs of Theorems 
4.1 and 4.4 below. 

The main difference between our work here and the algorithms used in
[21] and [10] is that we generate new data to construct samples of
$\hat{Y}_t^{w,\mathrm{new}}$. Therefore, the data used for the estimation of $q_t^{w,\mathrm{new}}$ are
conditionally independent given the sample of $X_t$, which enables us to
conclude (3.9) from standard rate-of-convergence results in nonparametric
regression. The generation
of the new, independent data is similar to the data generation in the random 
tree method (see, e.g., Section 8.3 in [12]). However, in contrast to the ran- 
dom tree method, we use nonparametric regression techniques to estimate 
the regression function, while in the random tree method simple averages are 
used to estimate the regression function point by point. As a consequence, 
the number of data points for the random tree method grows exponentially 
in T, while for our method it grows only linearly in T. 
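
The re-simulation idea can be sketched as follows: for every stored state at time t, one fresh continuation path is drawn, so that the number of simulated transitions per time step is n times the path length. The Gaussian random walk used as transition, the function names and the seed are assumptions for illustration; any simulator of the Markov transition could be substituted.

```python
import random


def simulate_step(x, rng):
    """One transition of a hypothetical Markov chain (Gaussian random walk)."""
    return x + rng.gauss(0.0, 1.0)


def fresh_partial_paths(states_t, length, rng):
    """For each state X_{i,t}, simulate one new path of `length` further steps.

    The increments are drawn anew, so given X_{i,t} the continuation does not
    depend on any path that was simulated earlier.
    """
    paths = []
    for x in states_t:
        path = [x]
        for _ in range(length):
            path.append(simulate_step(path[-1], rng))
        paths.append(path)
    return paths


# Hypothetical usage: three stored states, look-ahead of three steps.
rng = random.Random(0)
new_paths = fresh_partial_paths([0.0, 1.0, 2.0], 3, rng)
```

Each stored state spawns exactly one new path per time step, which is why the total simulation effort grows linearly in T rather than exponentially as in a branching tree.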

In the sequel we explain the definition of the estimates in detail. Let $n$
be the number of samples which we generate for our regression estimates,
and let $w_{\max} \in \{0, 1, \ldots, T-1\}$ be the maximal look-ahead which we use.
We start with generating $n$ independent sample paths

$X_i = (X_{i,t})_{t=0,\ldots,T} \quad (i = 1, \ldots, n)$

of the underlying Markov process $X$. Then we set

$\hat{q}_T = \hat{q}_{n,T} = 0$

and construct successively estimates of $q_{T-1}, \ldots, q_0$ as follows: Fix
$t \in \{0, 1, \ldots, T-1\}$ and assume that estimates $\hat{q}_{n,t+1}, \ldots, \hat{q}_{n,T-1}$ of
$q_{t+1}, \ldots, q_{T-1}$ are already constructed. Let

$w_{\max}(t) = \min\{w_{\max}, T-t-1\}$

be the maximal look-ahead of time period $t$. Generate independent sample
paths

$X_{i,t:w_{\max}(t)+1}^{t,\mathrm{new}} = (X_{i,s}^{t,\mathrm{new}})_{s=t,\ldots,t+w_{\max}(t)+1} \quad (i = 1, \ldots, n)$

starting at $X_{i,t}^{t,\mathrm{new}} = X_{i,t}$ for every $i \in \{1, \ldots, n\}$ such that, for all $i$, the
partial sample paths

(3.10)  $X_{i,t:w_{\max}(t)+1}^{t,\mathrm{new}}$

have the same distribution as $X_{i,t:w_{\max}(t)+1}$, and such that, given
$X_{1,t}, \ldots, X_{n,t}$, these data are independent of all previously generated data
points for all time points $s > t$. Define

$\hat{Y}_{i,t}^{w,\mathrm{new}} = \vartheta_{t+1:w}(f, \hat{q}_{n,t+1}, \ldots, \hat{q}_{n,t+w+1})(X_{i,t+1}^{t,\mathrm{new}}, \ldots, X_{i,t+w+1}^{t,\mathrm{new}})$

for every $w \in \{0, \ldots, w_{\max}(t)\}$ and apply a nonparametric least squares
estimate to the data

(3.11)  $((X_{i,t}, \hat{Y}_{i,t}^{w,\mathrm{new}}))_{i=1,\ldots,n}$

to construct estimates $\hat{q}_{n,t}^{w}$ of $q_t$. The final step is to choose

$\hat{w}_t \in \{0, 1, \ldots, w_{\max}(t)\}.$

The resulting estimator for $q_t$ is then given by

(3.12)  $\hat{q}_{n,t} = \hat{q}_{n,t}^{\hat{w}_t}.$


Next, we explain in detail how to define the nonparametric least squares
estimates applied to the data (3.11) and how to select $\hat{w}_t$ in a
data-dependent way. To this end, we split the sample into three parts: a
learning sample of size $n_l$, a testing sample of size $n_t$ and a validation
sample of size $n_v$, where $n = n_l + n_t + n_v$. Furthermore, we assume that we
are given a finite set $\mathcal{P}_n$ of parameters and, for each $p \in \mathcal{P}_n$, a set
$\mathcal{H}_{n,p}$ of functions $h : \mathbb{R}^d \to \mathbb{R}$.

For $w$ fixed, we first define $\hat{q}_{n,t}^{w}$. For every $p \in \mathcal{P}_n$, let

(3.13)  $\tilde{q}_{n,t}^{w,p}(\cdot) = \arg\min_{h \in \mathcal{H}_{n,p}} \frac{1}{n_l} \sum_{i=1}^{n_l} |h(X_{i,t}) - \hat{Y}_{i,t}^{w,\mathrm{new}}|^2$

be the least squares estimate of $q_t^{w,\mathrm{new}}$ in $\mathcal{H}_{n,p}$, which we take as an
estimate of $q_t$. In (3.13) we assume for notational simplicity that the minimum
exists; however, we do not require that it is unique. If the minimum is not
uniquely defined, we can choose as estimate any function which achieves
the minimum, and for this function the theoretical results in Section 4 will
hold.

Remark 3.1. It is enough that $\tilde{q}_{n,t}^{w,p}$ is an almost minimizer in the sense
that

(3.14)  $\frac{1}{n_l} \sum_{i=1}^{n_l} |\tilde{q}_{n,t}^{w,p}(X_{i,t}) - \hat{Y}_{i,t}^{w,\mathrm{new}}|^2 \le \min_{h \in \mathcal{H}_{n,p}} \frac{1}{n_l} \sum_{i=1}^{n_l} |h(X_{i,t}) - \hat{Y}_{i,t}^{w,\mathrm{new}}|^2 + o(n^{-1}).$

The result follows from the proofs.
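Let

(3.15)  $T_L z = \max\{-L, \min\{L, z\}\}, \quad z \in \mathbb{R},$

denote the truncation operator at threshold level $L > 0$. For a suitable
threshold parameter $\beta_n > 0$, to be determined later, we set

(3.16)  $\hat{q}_{n,t}^{w,p}(x) = T_{\beta_n} \tilde{q}_{n,t}^{w,p}(x) \quad (x \in \mathbb{R}^d),$

such that $\hat{q}_{n,t}^{w,p}$ is bounded in absolute value by $\beta_n$. Next, we apply the
method of splitting the sample to select the parameter $p$; see, for instance,
Chapter 7 in [14]. We set

(3.17)  $\hat{q}_{n,t}^{w}(x) = \hat{q}_{n,t}^{w,\hat{p}_t^w}(x) \quad (x \in \mathbb{R}^d),$

where $\hat{p}_t^w \in \mathcal{P}_n$ satisfies

(3.18)  $\frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat{q}_{n,t}^{w,\hat{p}_t^w}(X_{i,t}) - \hat{Y}_{i,t}^{w,\mathrm{new}}|^2 = \min_{p \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat{q}_{n,t}^{w,p}(X_{i,t}) - \hat{Y}_{i,t}^{w,\mathrm{new}}|^2.$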
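
Truncation and parameter selection by sample splitting can be sketched in a few lines. The example below assumes d = 1 and uses piecewise constant functions on cells of a given width as the function class, so that the least squares fit on the learning sample is simply the cell average; the candidate widths play the role of the parameter p. The function names, the data layout `(xs, ys)` and the value 0 on empty cells are assumptions made for this illustration.

```python
import math


def truncate(z, L):
    """Truncation operator T_L z = max(-L, min(L, z))."""
    return max(-L, min(L, z))


def fit_piecewise_constant(xs, ys, width):
    """Least squares fit over functions constant on cells [k*width, (k+1)*width)."""
    sums = {}
    for x, y in zip(xs, ys):
        k = math.floor(x / width)
        s, c = sums.get(k, (0.0, 0))
        sums[k] = (s + y, c + 1)
    means = {k: s / c for k, (s, c) in sums.items()}
    # Cells without learning data get the value 0 (an arbitrary convention).
    return lambda x: means.get(math.floor(x / width), 0.0)


def select_by_splitting(learn, test, widths, beta):
    """Fit on the learning sample for each width, truncate at beta, and keep
    the width with the smallest empirical L2 risk on the testing sample."""
    best = None
    for width in widths:
        raw = fit_piecewise_constant(learn[0], learn[1], width)
        est = lambda x, raw=raw: truncate(raw(x), beta)
        risk = sum((est(x) - y) ** 2 for x, y in zip(*test)) / len(test[0])
        if best is None or risk < best[0]:
            best = (risk, width, est)
    return best[1], best[2]


# Hypothetical usage: a step function observed without noise.
learn = ([0.1, 0.6, 1.1, 1.6], [0.0, 0.0, 1.0, 1.0])
test = ([0.3, 0.8, 1.3, 1.8], [0.0, 0.0, 1.0, 1.0])
chosen_width, q_hat = select_by_splitting(learn, test, [2.0, 1.0], beta=10.0)
```

Because the selection minimizes over a finite candidate list, a minimizer always exists; ties are resolved here in favor of the first candidate.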

Finally, we explain our choice of $w$. For each $w \in \{0, 1, \ldots, w_{\max}(t)\}$,
definition (3.17) provides an estimate $\hat{q}_{n,t}^{w}$ of $q_t$. The idea is to compute
from $\hat{q}_{n,t}^{w}$ an approximately optimal stopping rule which gives a lower bound
on the solution of the optimal stopping problem at time $t$. The optimal
candidate for $w$ is the one that maximizes the lower bound. We therefore set

(3.19)  $\hat{w}_t = \arg\max_{w \in \{0,\ldots,w_{\max}(t)\}} \frac{1}{n_v} \sum_{i=n_l+n_t+1}^{n} f_{\hat{\tau}_t^w}(X_{i,\hat{\tau}_t^w}^{t,\mathrm{new}}),$

where for $w \in \{0, 1, \ldots, w_{\max}(t)\}$ the approximately optimal stopping rule
$\hat{\tau}_t^w$ is defined by

(3.20)  $\hat{\tau}_t^w = \tau_t(\hat{q}_{n,t}^{w}, \hat{q}_{n,t+1}, \ldots, \hat{q}_{n,T-2}, \hat{q}_{n,T-1}),$

with $\tau_t(h)$ recursively defined as in (2.10). The specification (3.19) for $\hat{w}_t$
completes the definition of the estimator (3.12).

Remark 3.2. The quality and the computational cost of the estimator
depend primarily on the size of $n_l$, which is used in (3.13) to perform the
key nonparametric regression. On the other hand, the magnitude of $n_t$ and
$n_v$ is less critical because they are only used to select optimal parameter
values from a relatively small discrete set, and the corresponding objective
functions converge very fast, according to Hoeffding's and Bernstein's
inequalities. The impact of $n_t$ and $n_v$ on the overall computational cost is
also minor. In practical applications, $n_l$ should be chosen as large as the
available computational capacity allows.

Remark 3.3. Note that the optimization in (3.18) and (3.19) is per- 
formed over a finite set, which implies the existence of an optimizer. 

4. Main theoretical results. If the stochastic process of the underlying
state variables $X_t$ is unbounded, we first localize it to a bounded set $[-A,A]^d$.
For many industry models, the localization error can be estimated explicitly.
For illustration, we consider a discretely sampled jump-diffusion process $X_t$.
Let

(4.1)  $G f(t,x) = \left( \tfrac{1}{2} \mathrm{tr}(A \nabla^2 f) + \langle b, \nabla f \rangle \right)(t,x) + \int_{\mathbb{R}^d \setminus \{0\}} \left( f(t,x+u) - f(t,x) - 1_{\{\|u\| \le 1\}} \langle u, \nabla f(t,x) \rangle \right) S(t,x,du)$

be the generator of the corresponding continuous time process $X^0$, where we
assume that $A$, $b$ are Borel measurable, $A$ is positive definite, with norms
$\|A\| \le a_0$, $\|b\| \le b_0$, and $S$ is a positive kernel on $\mathbb{R}^d \setminus \{0\}$, Borel
measurable in $x$, such that

(4.2)  $\sup_x \int \left( \|u\|^2 1_{\{\|u\| \le 1\}} + \|u\| 1_{\{\|u\| > 1\}} \right) S(t,x,du) \le c_0.$

Define

(4.3)  $m_t = \sup_{0 \le s \le t} \|X_s^0 - x\|.$

Then, Lemma 17 of [20] states that, for every $A \in \mathbb{R}$ and positive $\lambda$, $\eta$, there
exists a constant $k$ only depending on $a_0$ and $c_0$ such that

(4.4) P(m t >A)<2dexp(-^-(A- \\x\\ -b t -??) + — kt(l + e^)) + —. 

V a 2 J rj 

To localize the process $X^0$ to a bounded set $[-A,A]^d$, we replace $X_t^0$ with
the process $X_t^{0,A}$ killed at first exit from $[-A,A]^d$. The semi-group of the
killed process is

(4.5)  $P_t^{0,A} f(x) = E_x\{f(X_t^{0,A})\} = E_x\{f(X_t^0) M_t\},$

where $M_t$ is the multiplicative functional $M_t = 1_{\{t < \tau_A\}}$ for
$\tau_A = \inf\{s \ge 0 : X_s^0 \notin [-A,A]^d\}$; see, for instance, [1]. We obtain

(4.6)  $\sup |E\{T_L f(X_T^0)\} - E\{T_L f(X_T^{0,A})\}| \le \sup E\{T_L f(X_T^0) \, 1_{\{m_T > A\}}\} \le L \, P(m_T > A),$

which, because of (4.4), can be made arbitrarily small by first choosing $\eta$
and then $A$ large enough. Proposition 5.2 in [10] estimates the error if the
payoff $f_t$ is replaced by the truncated payoff $T_L f_t$. We arrive at an a priori
bound for the localization and payoff truncation error.

In the following we derive the consistency of our estimator (3.12) under
the assumption

(4.7)  $X_t \in [-A,A]^d$ a.s. $(t \in \{0, 1, \ldots, T\})$.

In addition, we assume that the payoff $f_s$ is bounded on $[-A,A]^d$ by some
constant $L > 0$ such that

(4.8)  $|f_s(x)| \le L$ for $x \in [-A,A]^d$ and $s \in \{0, 1, \ldots, T\}$.

Observe that (4.8) implies $|q_t(x)| \le L$ for $x \in [-A,A]^d$ and $t \in \{0, 1, \ldots, T\}$,
so that $\beta_n = L$ can serve as the truncation parameter for the estimator.

In the sequel we use polynomial splines to define the function spaces
$\mathcal{H}_{n,p} = \mathcal{H}_p$, independent of the sample size $n$ and parameterized by
$p = (M, \alpha) \in \mathbb{N}_0 \times (0, \infty)$. We note that our results can be extended to
other function spaces in a straightforward manner.

For $p = (M, \alpha)$ and $k \in \mathbb{Z}$, we set $u_k = k \cdot \alpha$. Let $B_{k,M} : \mathbb{R} \to \mathbb{R}$ be the
univariate B-spline of degree $M$ with knot sequence $(u_l)_{l \in \mathbb{Z}}$ and support
$\mathrm{supp}(B_{k,M}) = [u_k, u_{k+M+1}]$. In the case of $M = 0$ the B-spline $B_{k,0}$ is the
indicator function of the interval $[u_k, u_{k+1})$. If $M = 1$, we obtain the
so-called hat-functions

$B_{k,1}(x) = \begin{cases} \dfrac{x - u_k}{u_{k+1} - u_k}, & \text{for } u_k \le x \le u_{k+1}, \\ \dfrac{u_{k+2} - x}{u_{k+2} - u_{k+1}}, & \text{for } u_{k+1} < x \le u_{k+2}, \\ 0, & \text{else.} \end{cases}$

The general definition of $B_{k,M}$ can be found, for example, in [8] or in
Section 14.1 of [14]. The B-splines $B_{k,M}$ are basis functions which are
piecewise univariate polynomials of degree $M$. They are globally $(M-1)$-times
continuously differentiable, and the $M$th derivative can only jump at the knots.

For every multi-index $\mathbf{k} = (k_1, \ldots, k_d) \in \mathbb{Z}^d$ we define the tensor product
B-spline $B_{\mathbf{k},M} : \mathbb{R}^d \to \mathbb{R}$ by

$B_{\mathbf{k},M}(x^{(1)}, \ldots, x^{(d)}) = B_{k_1,M}(x^{(1)}) \cdots B_{k_d,M}(x^{(d)}) \quad (x^{(1)}, \ldots, x^{(d)} \in \mathbb{R}).$
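
For equidistant knots the B-splines can be evaluated with the standard Cox-de Boor recursion, and the tensor product is a plain product over coordinates. The sketch below is one possible way to evaluate these basis functions, not the authors' implementation; the function names and the argument order are assumptions made for this example.

```python
def bspline(k, M, x, alpha):
    """Univariate B-spline B_{k,M}(x) with equidistant knots u_j = j * alpha,
    evaluated by the Cox-de Boor recursion."""
    u = lambda j: j * alpha
    if M == 0:
        # Indicator function of the half-open interval [u_k, u_{k+1}).
        return 1.0 if u(k) <= x < u(k + 1) else 0.0
    left = (x - u(k)) / (u(k + M) - u(k)) * bspline(k, M - 1, x, alpha)
    right = (u(k + M + 1) - x) / (u(k + M + 1) - u(k + 1)) * bspline(k + 1, M - 1, x, alpha)
    return left + right


def tensor_bspline(ks, M, xs, alpha):
    """Tensor product B-spline: product of univariate B-splines over coordinates."""
    value = 1.0
    for k, x in zip(ks, xs):
        value *= bspline(k, M, x, alpha)
    return value
```

For `M = 1` the recursion reproduces the hat function, for example `bspline(0, 1, 0.25, 0.5)` gives 0.5, and for fixed degree the shifted B-splines sum to one at every point.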
Let

$\mathcal{H}_{n,p} = \left\{ \sum_{\mathbf{k} \in \mathbb{Z}^d : \, \mathrm{supp}(B_{\mathbf{k},M}) \cap [-A,A]^d \neq \emptyset} a_{\mathbf{k}} \cdot B_{\mathbf{k},M} : a_{\mathbf{k}} \in \mathbb{R} \right\}$

be the span of the tensor product B-splines $B_{\mathbf{k},M}$ such that $\mathrm{supp}(B_{\mathbf{k},M})$ has
a nonempty intersection with $[-A,A]^d$. The spanning functions $B_{\mathbf{k},M}$ are
$(M-1)$-times continuously differentiable, piecewise multivariate polynomials
of degree less than or equal to $M$, defined on rectangular domains

(4.9)  $[u_{k_1}, u_{k_1+1}) \times \cdots \times [u_{k_d}, u_{k_d+1}) \quad (\mathbf{k} = (k_1, \ldots, k_d) \in \mathbb{Z}^d),$

and vanish on all of the rectangles (4.9) for which there exists $j \in \{1, \ldots, d\}$
such that either

$k_j > 0$ and $u_{k_j - M} > A$

or

$k_j < 0$ and $u_{k_j + M + 1} < -A$.

Consequently, $\mathcal{H}_{n,p}$ is a linear space of functions consisting of piecewise
polynomials with respect to equidistant partitions of $\mathbb{R}^d$ into cubes of edge
length $\alpha$, vanishing outside a compact set.
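For a sample size $n$, we use the parameters

$\mathcal{P}_n = \{(M, \alpha) : M \in \mathbb{N}_0,\ M \le \lceil \log(n) \rceil,\ \alpha = 2^k \text{ for some } k \in \mathbb{Z},\ |k| \le \lceil \log(n) \rceil\}.$

Here $\log$ denotes the natural logarithm, and for $z \in \mathbb{R}$, we denote by $\lceil z \rceil$ the
smallest integer greater than or equal to $z$.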
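
The parameter set is small and easy to enumerate, which is what makes the selection by sample splitting cheap. The following sketch lists all pairs of spline degree and knot spacing for a given sample size; the function name `parameter_grid` is a label chosen for this example.

```python
import math


def parameter_grid(n):
    """All pairs (M, alpha) with 0 <= M <= ceil(log n) and
    alpha = 2**k for integers k with |k| <= ceil(log n)."""
    bound = math.ceil(math.log(n))
    return [(M, 2.0 ** k) for M in range(bound + 1) for k in range(-bound, bound + 1)]
```

For `n = 100` the bound is 5, so the grid contains 6 degrees times 11 spacings, that is 66 candidates; the size grows only like the square of log n.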

Let $\hat{q}_{n,t}$ be defined as in Section 3 with $\mathcal{P}_n$ and $\mathcal{H}_{n,p}$ as above. Note
that $\mathcal{H}_{n,p}$ is a linear function space, which implies that the minimum in
(3.13) always exists. According to Remark 3.2, the computational cost of
the estimator is not adversely affected by large values for $n_t$ and $n_v$ of
roughly the size of $n_l$. Therefore, we choose for simplicity $n_v = n_t = \lfloor n/3 \rfloor$
and $n_l = n - n_v - n_t$. Our first result concerns consistency of the estimator.

Theorem 4.1. Assume (4.7) and (4.8), and let the estimate $\hat{q}_{n,t}$ be
defined as above with $\beta_n = L$. Then

$E \int |\hat{q}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) \to 0 \quad (n \to \infty)$

for all $t \in \{0, 1, \ldots, T-1\}$.

Remark 4.2. Because convergence in $L_1$ implies convergence in
probability, Theorem 4.1 proves, in particular, that
$\int |\hat{q}_{n,t} - q_t|^2 \, \mu_t(dx) \to 0$ in probability as $n \to \infty$.



Next we study the rate of convergence. It is well known in nonparametric 
regression that without smoothness assumptions on the regression function 
the rate of convergence can be arbitrarily slow (cf., e.g., [7, 9] or [14], Chapter 
3). We assume that the continuation values qt are (p, C)-smooth according 
to the following definition. 



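Definition 4.3. Let $p = k + \beta$ for some $k \in \mathbb{N}_0$, $\beta \in (0,1]$, and let $C > 0$.
A function $f : \mathbb{R}^d \to \mathbb{R}$ is called $(p,C)$-smooth if all partial derivatives

$\dfrac{\partial^k f}{\partial^{\alpha_1} x^{(1)} \cdots \partial^{\alpha_d} x^{(d)}}$

of total order $\alpha_1 + \cdots + \alpha_d = k$ exist and satisfy

$\left| \dfrac{\partial^k f}{\partial^{\alpha_1} x^{(1)} \cdots \partial^{\alpha_d} x^{(d)}}(x) - \dfrac{\partial^k f}{\partial^{\alpha_1} x^{(1)} \cdots \partial^{\alpha_d} x^{(d)}}(z) \right| \le C \cdot \|x - z\|^{\beta}$

for all $x, z \in \mathbb{R}^d$.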

Such a smoothness assumption is not unreasonable. For a sufficiently
regular diffusion or jump-diffusion process, the semi-group of Markov transition
operators $P_{s,t}(g)(x) = E[g(X_t) \mid X_s = x]$ is strongly smoothing already for
arbitrarily small time steps. In particular, we can expect that the continuation
value $q_t = P_{t,t+1}(\max(f_{t+1}, q_{t+1}))$ is $(p,C)$-smooth under suitable
assumptions on $X_t$ and the payoff $f_t$. At this point, it also becomes clear why it is
unfavorable to work directly with the value function $v_t$, which does not retain
the smoothness because the maximum operation is applied after the
transition operator. Next, we address the rate of convergence of the estimator.



Theorem 4.4. Let $p = k + \beta$ for some $k \in \mathbb{N}_0$, $\beta \in (0,1]$, and let $C > 0$.
Assume $k \le M_{\max}$, (4.7), (4.8) and

$q_t$ $(p,C)$-smooth

for all $t \in \{0, 1, \ldots, T-1\}$. Let the estimate $\hat{q}_{n,t}$ be defined as above with
$\beta_n = L$. Then for every $t \in \{0, 1, \ldots, T-1\}$,

$E \int |\hat{q}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \dfrac{\log n}{n} \right)^{2p/(2p+d)}.$



■ 
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Remark 4.5. We would like to stress that in Theorems 4.1 and 4.4 
there is no assumption on the distribution of X besides the assumption 
(4.7). In particular, it is not required that Xt has a density with respect to 
the Lebesgue-Borel measure. 

Remark 4.6. It is well known that the optimal rate of convergence for
the estimation of $(p, C)$-smooth functions is $n^{-2p/(2p+d)}$ (see, e.g., [27] or
[14], Chapter 3). Hence, the rate of convergence in Theorem 4.4 is optimal
up to a logarithmic factor.
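
To get a feeling for the size of the logarithmic factor and for the influence of the dimension d, the two rates can be tabulated numerically. The following is a minimal sketch with all constants dropped; the function names and the chosen values of n, p and d are illustrative assumptions and not part of the paper.

```python
import math


def optimal_rate(n: int, p: float, d: int) -> float:
    """Optimal rate n^(-2p/(2p+d)) for (p, C)-smooth functions (constants dropped)."""
    return n ** (-2.0 * p / (2.0 * p + d))


def theorem_rate(n: int, p: float, d: int) -> float:
    """Rate (log n / n)^(2p/(2p+d)) as in Theorem 4.4 (constants dropped)."""
    return (math.log(n) / n) ** (2.0 * p / (2.0 * p + d))


if __name__ == "__main__":
    n, p = 10_000, 2.0  # illustrative sample size and smoothness
    for d in (1, 3, 10):
        opt = optimal_rate(n, p, d)
        thm = theorem_rate(n, p, d)
        # The ratio equals (log n)^(2p/(2p+d)), the logarithmic factor.
        print(f"d={d:2d}  optimal={opt:.3e}  theorem={thm:.3e}  ratio={thm / opt:.2f}")
```

The ratio of the two rates is exactly $(\log n)^{2p/(2p+d)}$, and both rates deteriorate as d grows for fixed smoothness p.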

Remark 4.7. The definition of the estimator in Theorem 4.4 does not
depend on the degree of smoothness of $q_t$ represented by $(p, C)$. Nevertheless,
the estimator achieves the optimal rate of convergence for a particular
smoothness of the continuation value. In this sense the estimator is able to
adapt automatically to the smoothness of the continuation value, in contrast
to the estimates in [10].

Remark 4.8. Assume $X_0 = x_0$ a.s. for some $x_0 \in [-A, A]^d$. We can
estimate the price

$$V_0 = V_0(x_0) = \max\{f_0(x_0), q_0(x_0)\}$$

[cf. (1.2), (2.2) and (2.5)] of the Bermudan option by

$$\hat V_0 = \max\{f_0(x_0), \hat q_{n,0}(x_0)\}.$$

Since the distribution of $X_0$ is concentrated at $x_0$, and since
$|\max\{a, b\} - \max\{a, c\}| \le |b - c|$, Theorem 4.4 leads to the error bound

$$E\{|\hat V_0 - V_0|^2\} = E\{|\max\{f_0(x_0), \hat q_{n,0}(x_0)\} - \max\{f_0(x_0), q_0(x_0)\}|^2\} \le E\{|\hat q_{n,0}(x_0) - q_0(x_0)|^2\}.$$

5. Finite sample behavior. In this section we illustrate the finite sample
behavior of our algorithm (EKT) in comparison to the Longstaff-Schwartz
(LS) and Tsitsiklis-Van Roy (TR) algorithms. To compare the three algorithms,
we proceed as follows. We independently generate sample paths and
compute for each algorithm the Monte Carlo estimates (MCE) of the price
(1.6). Because all three algorithms provide a lower bound for the optimal
stopping value, and since we evaluate the approximate optimal stopping
rule on independent sets of sample paths, a higher MCE indicates a better
performance.

The underlying model for the dynamics of the stocks is a simple geometric
Brownian motion. We apply an Euler scheme to discretize the time interval
[0, 1] into m time steps. Consequently, the prices of the underlying stocks
on the time grid $0, \frac{1}{m}, \ldots, \frac{m-1}{m}, 1$ are given by

(5.1)

$(i = 1, \ldots, n,\ j = 1, \ldots, m)$.

Here, $X_0$ is the initial stock price at time 0, $r$ is the risk-free interest rate,
$\sigma$ the instantaneous volatility, and

$$\sum_{l=1}^{j} Z_{i,l}$$

is the sum of independent standard normally distributed random variables
$Z_{i,l}$ $(i = 1, \ldots, n,\ l = 1, \ldots, m)$. All option contracts are based on a time to
maturity of 1 year and a risk-free continuously compounded interest rate
r = 0.05.
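
The path generation described above can be sketched as follows. Since the explicit form of (5.1) is not reproduced here, the code assumes the standard log-normal discretization of a geometric Brownian motion under the risk-neutral measure; the function name `simulate_gbm_paths`, its arguments and the seed handling are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_gbm_paths(x0, r, sigma, n, m, seed=0):
    """Return an (n, m + 1) array of prices on the grid 0, 1/m, ..., 1.

    Assumed dynamics (not taken verbatim from the paper):
    X_{i,j} = x0 * exp((r - sigma^2 / 2) * j / m + sigma / sqrt(m) * sum_{l<=j} Z_{i,l}).
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / m
    z = rng.standard_normal((n, m))   # independent standard normals Z_{i,l}
    w = np.cumsum(z, axis=1)          # partial sums over l = 1, ..., j
    j = np.arange(1, m + 1)
    log_paths = (r - 0.5 * sigma**2) * j * dt + sigma * np.sqrt(dt) * w
    paths = np.empty((n, m + 1))
    paths[:, 0] = x0                  # all paths start at the initial price
    paths[:, 1:] = x0 * np.exp(log_paths)
    return paths


if __name__ == "__main__":
    paths = simulate_gbm_paths(x0=100.0, r=0.05, sigma=0.25, n=10_000, m=12)
    print(paths.shape, paths[:, -1].mean())
```

Under these assumptions the discounted terminal price is a martingale, so the sample mean of `exp(-r) * paths[:, -1]` should be close to `x0`.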

Figures 1 and 3 report the results for 100 independent MCE of an ordinary
Bermudan put option and for a more complicated Bermudan option with a
strangle spread payoff. Each algorithm is based on a sample size n = 10000.
For (LS) and (TR), we use polynomials of degree 3. For (EKT), we set the
number of learning, training and validation samples to $n_l = 6000$, $n_t = 2000$
and $n_v = 2000$, and choose the degree M, the knot distance a and the
look-ahead parameter w(t) in a data-dependent manner as described in Section
3 from the sets $M \in \{0, 1, 2\}$, $a \in$ {^,^,^,^}, and $w(t) \in \{0, 4, T-t-1\}$.

We first analyze the results in Figure 1 for a Bermudan put with exercise
price 90 on an underlying with instantaneous volatility $\sigma = 0.25$. The time
discretization is performed in monthly steps. Our algorithm is slightly better
than (LS) and comparable to (TR). This is not surprising, since it is well
known that for simple payoff functions both (LS) and (TR) perform
very well.
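
For orientation, a bare-bones version of the (LS) baseline for this put can be sketched as below. This is only an illustrative sketch and not the implementation used for Figure 1: the initial price `x0 = 100` is an assumption (the initial price is not stated in this section), the regression uses in-the-money paths only, and the price is evaluated on the same paths that were used for the regression rather than on an independent sample as described above.

```python
import numpy as np


def longstaff_schwartz_put(x0=100.0, strike=90.0, r=0.05, sigma=0.25,
                           m=12, n=10_000, degree=3, seed=0):
    """In-sample Longstaff-Schwartz estimate of a Bermudan put price (sketch)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / m
    disc = np.exp(-r * dt)
    z = rng.standard_normal((n, m))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = x0 * np.exp(np.cumsum(log_inc, axis=1))  # prices at times j/m, j = 1..m

    cash = np.maximum(strike - paths[:, -1], 0.0)    # cash flow if held to maturity
    for j in range(m - 2, -1, -1):                   # step backwards through exercise dates
        cash *= disc                                 # discount cash flows to the current date
        x = paths[:, j]
        exercise = np.maximum(strike - x, 0.0)
        itm = exercise > 0                           # regress on in-the-money paths only
        if itm.sum() > degree + 1:
            coeffs = np.polyfit(x[itm] / strike, cash[itm], degree)
            continuation = np.polyval(coeffs, x[itm] / strike)
            stop = exercise[itm] > continuation      # exercise where payoff beats continuation
            idx = np.flatnonzero(itm)[stop]
            cash[idx] = exercise[idx]
    # Discount from the first exercise date to time 0 and allow immediate exercise.
    return max(max(strike - x0, 0.0), float(disc * cash.mean()))


if __name__ == "__main__":
    print(longstaff_schwartz_put())
```

Replacing the in-sample average by an average over fresh paths, with the fitted coefficients kept fixed, would give the lower-bound estimate used in the comparison.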

Figure 3 consolidates the simulation results of a Bermudan option with
strangle spread payoff with strikes 50, 90, 110 and 150, as illustrated in Figure 2.
The volatility is increased to $\sigma = 0.5$, the time discretization is set to m = 48.
This time (EKT) provides a higher MCE of the option price and therefore
clearly outperforms (LS) and (TR).
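
The payoff itself is only shown graphically in Figure 2. A minimal sketch, assuming the usual reading of a strangle spread (a long strangle at the inner strikes 90 and 110 combined with a short strangle at the outer strikes 50 and 150), is given below; the function name and this decomposition are assumptions, not a formula taken from the paper.

```python
import numpy as np


def strangle_spread_payoff(x, k1=50.0, k2=90.0, k3=110.0, k4=150.0):
    """Assumed strangle spread: long put k2 and call k3, short put k1 and call k4."""
    x = np.asarray(x, dtype=float)
    return (-np.maximum(k1 - x, 0.0) + np.maximum(k2 - x, 0.0)
            + np.maximum(x - k3, 0.0) - np.maximum(x - k4, 0.0))


if __name__ == "__main__":
    print(strangle_spread_payoff([30.0, 70.0, 100.0, 130.0, 200.0]))
```

Under this assumption the payoff is zero between the inner strikes, grows linearly towards the outer strikes and is capped at 40 beyond them, which makes it non-convex and harder to fit with low-degree global polynomials.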

Finally, Figure 4 reports the simulation results of a Bermudan basket
option with strangle spread payoff on the average of three correlated
underlyings. The option prices are normalized to start at 1. The strikes are set at
0.85, 0.95, 1.05 and 1.15. This time (EKT) is based on degrees $M \in \{0, 1, 2\}$,
knot distance $a \in \{1, 1.5, 2, 4\}$ and a reduced sample size of only n = 4000,
split into $n_l = 800$, $n_t = 2400$ and $n_v = 800$. (LS) and (TR) still use n = 10000
but approximate the continuation value with polynomials of degree 2 (as
polynomials of degree 3 resulted in lower MCE). Again, (EKT) provides the
highest MCE of the option price.

Fig. 1. Realized option prices of Bermudan put option (box plots of the prices for LS, TR and EKT). The boxes stretch from the 25th
percentile to the 75th percentile, the median is shown as a line across the box.




Fig. 2. Strangle spread payoff with strike prices 50, 90, 110 and 150.

Fig. 3. Realized option prices of Bermudan option with strangle spread payoff.

Fig. 4. Realized option prices of Bermudan basket option with strangle spread payoff
based on the average of three correlated underlyings.

6. Proofs. In the proofs we will need an auxiliary result on the properties
of the method of splitting the sample, which we formulate and prove for the
sake of generality in a fixed design regression model.

Let $x_1, \ldots, x_n \in \mathbb{R}^d$ and let $Y_1, \ldots, Y_n$ be independent square integrable
random variables which satisfy

$$E Y_i = m(x_i) \quad (i = 1, \ldots, n)$$

for some function $m : \mathbb{R}^d \to \mathbb{R}$. Let $\mathcal{P}_n$ be a finite set of parameters and
assume that for each $p \in \mathcal{P}_n$ an estimate $m_p : \mathbb{R}^d \to \mathbb{R}$ is given. Choose $p^* \in \mathcal{P}_n$
by minimizing the empirical $L_2$ risk on the sample $(x_1, Y_1), \ldots, (x_n, Y_n)$, that
is, assume

$$\frac{1}{n} \sum_{i=1}^{n} |m_{p^*}(x_i) - Y_i|^2 = \min_{p \in \mathcal{P}_n} \frac{1}{n} \sum_{i=1}^{n} |m_p(x_i) - Y_i|^2 .$$

Then, the following bound on the error

$$\frac{1}{n} \sum_{i=1}^{n} |m_{p^*}(x_i) - m(x_i)|^2$$

of $m_{p^*}$ holds.



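Lemma 6.1. Under the above assumptions, we have for each $\varepsilon > 0$

$$P\left\{ \frac{1}{n} \sum_{i=1}^{n} |m_{p^*}(x_i) - m(x_i)|^2 > \varepsilon + 18 \cdot \min_{p \in \mathcal{P}_n} \frac{1}{n} \sum_{i=1}^{n} |m_p(x_i) - m(x_i)|^2 \right\} \le c_1 \cdot \max_{i=1,\ldots,n} E Y_i^2 \cdot \frac{|\mathcal{P}_n|}{\varepsilon \cdot n}$$

for some constant $c_1$ which does not depend on n or $\varepsilon$.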
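
The selection rule analyzed in Lemma 6.1 can be sketched in a few lines: among finitely many candidate estimates, pick the one with the smallest empirical $L_2$ risk on the sample. The code below is an illustrative sketch only; the helper name `select_by_validation`, the sine test function and the polynomial candidates are assumptions, and the evaluation points are drawn at random although the lemma treats them as fixed.

```python
import numpy as np


def select_by_validation(candidates, x_val, y_val):
    """Return (index, risk) of the candidate with the smallest empirical L2 risk."""
    risks = [float(np.mean((m_p(x_val) - y_val) ** 2)) for m_p in candidates]
    best = int(np.argmin(risks))
    return best, risks[best]


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def m(x):
        return np.sin(2 * np.pi * x)  # hypothetical true regression function

    x_fit = rng.uniform(0, 1, 200)
    y_fit = m(x_fit) + 0.3 * rng.standard_normal(200)
    x_val = rng.uniform(0, 1, 200)
    y_val = m(x_val) + 0.3 * rng.standard_normal(200)

    degrees = [0, 1, 3, 5]            # finite parameter set, playing the role of P_n
    candidates = [np.poly1d(np.polyfit(x_fit, y_fit, deg)) for deg in degrees]
    best, risk = select_by_validation(candidates, x_val, y_val)
    print("selected degree:", degrees[best], "validation risk:", round(risk, 3))
```

The candidates are fitted on one part of the data and compared on another, which mirrors the split into learning and training samples used in the algorithm.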

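Proof. Set

$$m^* = \arg\min_{f \in \{m_p : p \in \mathcal{P}_n\}} \frac{1}{n} \sum_{i=1}^{n} |f(x_i) - m(x_i)|^2 .$$

By Lemma 1 in [17] or standard results from the book [30] (see proof of
Theorem 10.11 in [30]), we have

$$\begin{aligned}
& P\left\{ \frac{1}{n} \sum_{i=1}^{n} |m_{p^*}(x_i) - m(x_i)|^2 > \varepsilon + 18 \cdot \min_{p \in \mathcal{P}_n} \frac{1}{n} \sum_{i=1}^{n} |m_p(x_i) - m(x_i)|^2 \right\} \\
& \quad \le P\left\{ \frac{\varepsilon}{2} < \frac{1}{n} \sum_{i=1}^{n} |m_{p^*}(x_i) - m^*(x_i)|^2 \le \frac{16}{n} \sum_{i=1}^{n} (m_{p^*}(x_i) - m^*(x_i)) \cdot (Y_i - m(x_i)) \right\} \\
& \quad \le P\left\{ \exists p \in \mathcal{P}_n : \frac{\varepsilon}{2} < \frac{1}{n} \sum_{i=1}^{n} |m_p(x_i) - m^*(x_i)|^2 \le \frac{16}{n} \sum_{i=1}^{n} (m_p(x_i) - m^*(x_i)) \cdot (Y_i - m(x_i)) \right\} \\
& \quad \le |\mathcal{P}_n| \cdot \max_{p \in \mathcal{P}_n} \sum_{s=0}^{\infty} P\left\{ 2^{s-1} \varepsilon < \frac{1}{n} \sum_{i=1}^{n} |m_p(x_i) - m^*(x_i)|^2 \le 2^s \varepsilon, \right. \\
& \qquad\qquad \left. \frac{1}{n} \sum_{i=1}^{n} |m_p(x_i) - m^*(x_i)|^2 \le \frac{16}{n} \sum_{i=1}^{n} (m_p(x_i) - m^*(x_i)) \cdot (Y_i - m(x_i)) \right\} \\
& \quad \le |\mathcal{P}_n| \cdot \sum_{s=0}^{\infty} \max_{\substack{p \in \mathcal{P}_n, \\ (1/n) \sum_{i=1}^{n} |m_p(x_i) - m^*(x_i)|^2 \le 2^s \varepsilon}} P\left\{ \frac{1}{n} \sum_{i=1}^{n} (m_p(x_i) - m^*(x_i)) \cdot (Y_i - m(x_i)) > \frac{2^s \varepsilon}{32} \right\}.
\end{aligned}$$

Because of the variance estimate

$$V\left( \frac{1}{n} \sum_{i=1}^{n} (m_p(x_i) - m^*(x_i)) \cdot (Y_i - m(x_i)) \right) \le \frac{1}{n^2} \sum_{i=1}^{n} (m_p(x_i) - m^*(x_i))^2 \cdot \max_{i=1,\ldots,n} E Y_i^2 ,$$

we can bound the right-hand side from above with Chebyshev's inequality
by

$$|\mathcal{P}_n| \cdot \sum_{s=0}^{\infty} \frac{(1/n) \cdot 2^s \cdot \varepsilon \cdot \max_{i=1,\ldots,n} E Y_i^2}{(2^s \varepsilon / 32)^2} = \frac{|\mathcal{P}_n| \max_{i=1,\ldots,n} E Y_i^2}{n \cdot \varepsilon} \cdot 32^2 \cdot 2. \qquad \Box$$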
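
The last equality in the proof of Lemma 6.1 is a geometric series. Spelled out (a worked check under the notation of the lemma, added for convenience), it also shows which value of the constant the argument yields:

```latex
\sum_{s=0}^{\infty} \frac{\frac{1}{n} \, 2^{s} \varepsilon \, \max_{i} E Y_i^2}{(2^{s} \varepsilon / 32)^2}
  = \frac{32^2 \max_{i} E Y_i^2}{n \varepsilon} \sum_{s=0}^{\infty} 2^{-s}
  = \frac{2 \cdot 32^2 \max_{i} E Y_i^2}{n \varepsilon},
\qquad \text{so } c_1 = 2 \cdot 32^2 = 2048 \text{ is admissible.}
```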

Proof of Theorem 4.1. Because of

$$E \int |\hat q_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) \le \sum_{w=0}^{w_{\max}(t)} E \int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx),$$

it is enough to prove that

(6.1) $$E \int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) \to 0 \quad (n \to \infty)$$

for every $t \in \{0, 1, \ldots, T-1\}$ and every $w \in \{0, 1, \ldots, w_{\max}(t)\}$.

Fix $t \in \{0, 1, \ldots, T-1\}$ and assume (by induction) that we have for every
$s \in \{t+1, \ldots, T-1\}$ and every $v \in \{0, 1, \ldots, w_{\max}(s)\}$

(6.2) $$E \int |\hat q^{v}_{n,s}(x) - q_s(x)|^2 \, \mu_s(dx) \to 0 \quad (n \to \infty).$$

Fix $w \in \{0, 1, \ldots, w_{\max}(t)\}$. In the following we show

(6.3) $$E \int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) \to 0 \quad (n \to \infty).$$

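To this end, we apply for a fixed $p_n \in \mathcal{P}_n$ the error decomposition

$$\begin{aligned}
\int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx)
= {} & \left( \int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) - \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right) \\
& + \left( \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q_t(X_{i,t})|^2 - \frac{2}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \right) \\
& + \left( \frac{2}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 - \frac{36}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \right) \\
& + \left( \frac{36}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 - \frac{72}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right) \\
& + \frac{72}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \\
= {} & \sum_{j=1}^{5} T_{j,n}.
\end{aligned}$$

The proof will be completed once we have shown that

(6.4) $$\limsup_{n \to \infty} E T_{j,n} \le 0$$

for $j \in \{1, 2, \ldots, 5\}$.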

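From now on we denote by $\mathcal{D}_{n,t+1}$ the set of all the data used in the
construction of the estimates $\hat q^{w,p}_{n_l,s}$ for $s > t$, $w \in \{0, 1, \ldots, w_{\max}(s)\}$ and
$p \in \mathcal{P}_n$.

Because $\hat q^{w,p}_{n_l,t}$ and $q_t$ are bounded in absolute value by L, we conclude from
Hoeffding's inequality (see, e.g., Lemma A.3 in [14]) that

$$\begin{aligned}
& P\{ T_{1,n} > \varepsilon \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \\
& \quad \le |\mathcal{P}_n| \cdot \max_{p \in \mathcal{P}_n} P\left\{ \int |\hat q^{w,p}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) - \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 > \varepsilon \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right\} \\
& \quad \le |\mathcal{P}_n| \cdot \exp\left( - \frac{n_t \varepsilon^2}{8 L^4} \right) = \exp\left( \log(|\mathcal{P}_n|) - \frac{n_t \varepsilon^2}{8 L^4} \right).
\end{aligned}$$

Thus,

$$\begin{aligned}
E T_{1,n} & \le \int_0^{\infty} P\{ T_{1,n} > s \} \, ds \\
& = \int_0^{\infty} E\{ P\{ T_{1,n} > s \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} \, ds \\
& \le 4 L^2 \sqrt{\log(|\mathcal{P}_n|)/n_t} + \int_{4 L^2 \sqrt{\log(|\mathcal{P}_n|)/n_t}}^{\infty} \exp\left( - \frac{n_t s^2}{16 L^4} \right) ds \\
& \le 4 L^2 \sqrt{\log(|\mathcal{P}_n|)/n_t} + \int_{4 L^2 \sqrt{\log(|\mathcal{P}_n|)/n_t}}^{\infty} \exp\left( - \frac{n_t \cdot 4 L^2 \sqrt{\log(|\mathcal{P}_n|)/n_t}}{16 L^4} \cdot s \right) ds \\
& \le 4 L^2 \sqrt{\log(|\mathcal{P}_n|)/n_t} + \frac{4 L^2}{n_t \sqrt{\log(|\mathcal{P}_n|)/n_t}} \cdot \exp(-\log(|\mathcal{P}_n|)) \to 0 \quad (n \to \infty).
\end{aligned}$$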

Furthermore, by $a^2 = (a - b + b)^2 \le 2(a - b)^2 + 2b^2$, we get

$$T_{2,n} \le \frac{2}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |q^{w,\mathrm{new}}_t(X_{i,t}) - q_t(X_{i,t})|^2 ,$$

from which we conclude, together with (3.7) and (6.2), that

$$E T_{2,n} = E\{ E\{ T_{2,n} \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} \le 2 E \int |q^{w,\mathrm{new}}_t(x) - q_t(x)|^2 \, \mu_t(dx) \to 0 \quad (n \to \infty).$$

In a similar way we obtain

$$E T_{4,n} \le 72 E \int |q^{w,\mathrm{new}}_t(x) - q_t(x)|^2 \, \mu_t(dx) \to 0 \quad (n \to \infty).$$
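To bound $T_{3,n}$, we use Lemma 6.1, which shows

$$\begin{aligned}
& P\{ T_{3,n} > \varepsilon \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \\
& \quad \le P\left\{ \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 > \frac{\varepsilon}{2} + 18 \cdot \min_{p \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right\} \\
& \quad \le c_2 \cdot \frac{|\mathcal{P}_n|}{\varepsilon \cdot n_t}.
\end{aligned}$$

This implies for any $u > 0$ that

$$\begin{aligned}
E T_{3,n} & \le \int_0^{\infty} P\{ T_{3,n} > \varepsilon \} \, d\varepsilon \\
& \le \int_0^{\infty} E\{ P\{ T_{3,n} > \varepsilon \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} \, d\varepsilon \\
& \le u + \int_u^{\mathrm{const}} c_2 \cdot \frac{|\mathcal{P}_n|}{\varepsilon \cdot n_t} \, d\varepsilon \\
& = u + c_2 \cdot \frac{|\mathcal{P}_n|}{n_t} \cdot (\log(\mathrm{const}) - \log u).
\end{aligned}$$

To get to the last line, we have used that (3.16) and the boundedness of
$q^{w,\mathrm{new}}_t$ (which is a consequence of the boundedness of $f_t$ on $[-A, A]^d$) yield

$$T_{3,n} \le \frac{2}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \le \mathrm{const}.$$

Setting $u = |\mathcal{P}_n|/n_t$, we arrive at

$$\limsup_{n \to \infty} E T_{3,n} \le 0.$$

Furthermore

(6.5) $$E T_{5,n} = E\{ E\{ T_{5,n} \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} = 72 \cdot E \int |\hat q^{w,p_n}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx).$$

Consequently, it remains to verify that

(6.6) $$E \int |\hat q^{w,p_n}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) \to 0 \quad (n \to \infty)$$

for some suitably selected $p_n \in \mathcal{P}_n$.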

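To prove (6.6), we set $p_n = (0, 2^{-\lceil \log_2(n)/(2+d) \rceil})$ (where $\log_2$ is the logarithm
for base 2) and consider the error decomposition

$$\begin{aligned}
\int |\hat q^{w,p_n}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx)
= {} & \left( \int |\hat q^{w,p_n}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) - \frac{2}{n_l} \sum_{i=1}^{n_l} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right) \\
& + \left( \frac{2}{n_l} \sum_{i=1}^{n_l} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 - \frac{2}{n_l} \sum_{i=1}^{n_l} |\tilde q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right) \\
& + \left( \frac{2}{n_l} \sum_{i=1}^{n_l} |\tilde q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 - \frac{4}{n_l} \sum_{i=1}^{n_l} |\tilde q^{w,p_n}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \right) \\
& + \frac{4}{n_l} \sum_{i=1}^{n_l} |\tilde q^{w,p_n}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \\
= {} & \sum_{j=6}^{9} T_{j,n}.
\end{aligned}$$

Because $q_t$ is bounded in absolute value by L, we have

$$T_{7,n} \le 0 \quad \text{and} \quad E T_{7,n} \le 0.$$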
In the same way as for $T_{2,n}$, we obtain from (3.7) and (6.2)

$$\begin{aligned}
E T_{8,n} & \le 4 \cdot E\left\{ E\left\{ \frac{1}{n_l} \sum_{i=1}^{n_l} |q_t(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \;\Big|\; \mathcal{D}_{n,t+1} \right\} \right\} \\
& = 4 \cdot E \int |q_t(x) - q^{w,\mathrm{new}}_t(x)|^2 \, \mu_t(dx) \to 0 \quad (n \to \infty),
\end{aligned}$$

where the last equality follows from the fact that the conditional expectation
$q^{w,\mathrm{new}}_t(x)$ does not depend on data from time t.

Next, we estimate $T_{6,n}$. The functions $\hat q^{w,p_n}_{n_l,t}$ and $q_t$ are bounded in absolute
value by L, and $\tilde q^{w,p_n}_{n_l,t}$ belongs to the linear vector space $\mathcal{H}_{n,p_n}$, whose
dimension $D_n$ is bounded by some constant (depending on A) times $n^{d/(2+d)}$.
As in the proof of Theorem 11.3 in [14] [in particular, the proof of inequality
(11.6)], we obtain

$$E T_{6,n} = E\{ E\{ T_{6,n} \mid \mathcal{D}_{n,t+1} \} \} \le c_3 L^2 \, \frac{(\log n_l + 1) \cdot n^{d/(2+d)}}{n_l} \to 0 \quad (n \to \infty).$$

It remains to bound $T_{9,n}$. With

$$\sigma^2 = \sup_{x \in \mathbb{R}^d} E^*\{ |Y^{w,\mathrm{new}}_{i,t}|^2 \mid X_{i,t} = x \} \le 4 L^2 < \infty,$$

we conclude from Theorem 11.1 in [14]

$$E\{ T_{9,n} \mid X_{i,t}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \le 4 \sigma^2 \, \frac{c_4 \, n^{d/(2+d)}}{n_l} + 4 \min_{h \in \mathcal{H}_{n,p_n}} \frac{1}{n_l} \sum_{i=1}^{n_l} |h(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 ,$$

which then leads to

$$\begin{aligned}
E T_{9,n} & = E\{ E\{ T_{9,n} \mid X_{i,t}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} \\
& \le 4 \sigma^2 \, \frac{c_4 \, n^{d/(2+d)}}{n_l} + 4 \min_{h \in \mathcal{H}_{n,p_n}} E \int |h(x) - q^{w,\mathrm{new}}_t(x)|^2 \, \mu_t(dx) \\
& \le 4 \sigma^2 \, \frac{c_4 \, n^{d/(2+d)}}{n_l} + 8 E \int |q^{w,\mathrm{new}}_t(x) - q_t(x)|^2 \, \mu_t(dx) + 8 \min_{h \in \mathcal{H}_{n,p_n}} \int |h(x) - q_t(x)|^2 \, \mu_t(dx).
\end{aligned}$$

Because of (3.7), (6.2) and

$$\int |q_t(x)|^2 \, \mu_t(dx) \le L^2 < \infty,$$

which implies that $q_t$ can be approximated arbitrarily closely by functions
from $\mathcal{H}_{n,p_n}$ (this is a consequence of Theorem A.1 in [14] and the fact that
any continuous function can be approximated in the supremum norm on the
compact set $[-A, A]^d$ arbitrarily closely by the piecewise constant functions
in $\mathcal{H}_{n,p_n}$ as $n \to \infty$), the right-hand side of the above inequality tends to
zero for $n \to \infty$. The proof of Theorem 4.1 is complete. □

Proof of Theorem 4.4. The proof is similar to the proof of Theorem 4.1.
The main difference is that we use Bernstein's inequality instead
of Hoeffding's inequality, which requires that we also control the variance.
Because of

$$E \int |\hat q_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) \le \sum_{w=0}^{w_{\max}(t)} E \int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx),$$

it suffices to show

(6.7) $$E \int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}$$

for every $t \in \{0, 1, \ldots, T-1\}$ and every $w \in \{0, 1, \ldots, w_{\max}(t)\}$.

Fix $t \in \{0, 1, \ldots, T-1\}$ and assume (by induction) that we have for every
$s \in \{t+1, \ldots, T-1\}$ and every $v \in \{0, 1, \ldots, w_{\max}(s)\}$

(6.8) $$E \int |\hat q^{v}_{n,s}(x) - q_s(x)|^2 \, \mu_s(dx) \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}.$$

Fix $w \in \{0, 1, \ldots, w_{\max}(t)\}$. We show

(6.9) $$E \int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}.$$

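To this end, we apply for fixed $p_n \in \mathcal{P}_n$ the error decomposition

$$\begin{aligned}
\int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx)
= {} & \left( \int |\hat q^{w}_{n,t}(x) - q_t(x)|^2 \, \mu_t(dx) - \frac{2}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right) \\
& + \left( \frac{2}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q_t(X_{i,t})|^2 - \frac{4}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \right) \\
& + \left( \frac{4}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 - \frac{72}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \right) \\
& + \left( \frac{72}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 - \frac{144}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right) \\
& + \frac{144}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \\
= {} & \sum_{j=1}^{5} T_{j,n}.
\end{aligned}$$

The proof is completed once we have shown that

(6.10) $$E T_{j,n} \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}$$

for $j \in \{1, 2, \ldots, 5\}$.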

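To apply Bernstein's inequality, we first bound the variance

$$\begin{aligned}
\sigma^2 & = V\left( |\hat q^{w,p}_{n_l,t}(X_{n_l+1,t}) - q_t(X_{n_l+1,t})|^2 \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right) \\
& \le E\left( |\hat q^{w,p}_{n_l,t}(X_{n_l+1,t}) - q_t(X_{n_l+1,t})|^4 \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right) \\
& \le 4 L^2 \, E\left( |\hat q^{w,p}_{n_l,t}(X_{n_l+1,t}) - q_t(X_{n_l+1,t})|^2 \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right) \\
& = 4 L^2 \int |\hat q^{w,p}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx).
\end{aligned}$$

Then, because $\hat q^{w,p}_{n_l,t}$ and $q_t$ are bounded in absolute value by L, we obtain
from Bernstein's inequality (see, e.g., Lemma A.2 in [14])

$$\begin{aligned}
& P\{ T_{1,n} > \varepsilon \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \\
& \quad \le |\mathcal{P}_n| \cdot \max_{p \in \mathcal{P}_n} P\left\{ \int |\hat q^{w,p}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) - \frac{2}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 > \varepsilon \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right\} \\
& \quad = |\mathcal{P}_n| \cdot \max_{p \in \mathcal{P}_n} P\left\{ \int |\hat q^{w,p}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) - \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right. \\
& \qquad\qquad \left. > \frac{\varepsilon}{2} + \frac{1}{2} \int |\hat q^{w,p}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right\} \\
& \quad \le |\mathcal{P}_n| \cdot \max_{p \in \mathcal{P}_n} P\left\{ \int |\hat q^{w,p}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) - \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right. \\
& \qquad\qquad \left. > \frac{\varepsilon}{2} + \frac{1}{2} \cdot \frac{\sigma^2}{4 L^2} \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right\} \\
& \quad \le |\mathcal{P}_n| \cdot \exp\left( - \frac{n_t (\varepsilon/2 + \sigma^2/(8L^2))^2}{2 \sigma^2 + 2 (\varepsilon/2 + \sigma^2/(8L^2)) \cdot (4L^2/3)} \right) \\
& \quad \le |\mathcal{P}_n| \cdot \exp\left( - \frac{n_t (\varepsilon/2 + \sigma^2/(8L^2))^2}{(16 L^2 + 8L^2/3)(\varepsilon/2 + \sigma^2/(8L^2))} \right) \\
& \quad \le |\mathcal{P}_n| \cdot \exp\left( - \frac{1}{16 L^2 + 8L^2/3} \cdot \frac{n_t \varepsilon}{2} \right) = |\mathcal{P}_n| \cdot \exp\left( - \frac{3}{112} \cdot \frac{n_t \varepsilon}{L^2} \right).
\end{aligned}$$

Thus,

$$\begin{aligned}
E T_{1,n} & \le \int_0^{\infty} P\{ T_{1,n} > s \} \, ds \\
& = \int_0^{\infty} E\{ P\{ T_{1,n} > s \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} \, ds \\
& \le |\mathcal{P}_n| \cdot \int_0^{\infty} \exp\left( - \frac{3}{112} \cdot \frac{n_t s}{L^2} \right) ds = \frac{112 L^2}{3} \cdot \frac{|\mathcal{P}_n|}{n_t} \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}.
\end{aligned}$$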

Furthermore, by $a^2 = (a - b + b)^2 \le 2(a - b)^2 + 2b^2$, we get

$$T_{2,n} \le \frac{4}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |q^{w,\mathrm{new}}_t(X_{i,t}) - q_t(X_{i,t})|^2 ,$$

from which we conclude, together with (3.7) and (6.8), that

$$\begin{aligned}
E T_{2,n} & = E\{ E\{ T_{2,n} \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} \\
& \le 4 E \int |q^{w,\mathrm{new}}_t(x) - q_t(x)|^2 \, \mu_t(dx) \\
& \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}.
\end{aligned}$$

Similarly, we get

$$E T_{4,n} \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}.$$

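To bound $T_{3,n}$, we apply Lemma 6.1, which shows

$$\begin{aligned}
& P\{ T_{3,n} > \varepsilon \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \\
& \quad \le P\left\{ \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 > \frac{\varepsilon}{4} + 18 \cdot \min_{p \in \mathcal{P}_n} \frac{1}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w,p}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \;\Big|\; X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \right\} \\
& \quad \le c_5 \cdot \frac{|\mathcal{P}_n|}{\varepsilon \cdot n_t}.
\end{aligned}$$

This implies for any $u > 0$

$$\begin{aligned}
E T_{3,n} & \le \int_0^{\infty} P\{ T_{3,n} > \varepsilon \} \, d\varepsilon \\
& \le \int_0^{\infty} E\{ P\{ T_{3,n} > \varepsilon \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} \, d\varepsilon \\
& \le u + \int_u^{\mathrm{const}} c_5 \cdot \frac{|\mathcal{P}_n|}{\varepsilon \cdot n_t} \, d\varepsilon \\
& = u + c_5 \cdot \frac{|\mathcal{P}_n|}{n_t} \cdot (\log(\mathrm{const}) - \log u),
\end{aligned}$$

where we have used that (3.16) and the boundedness of $q^{w,\mathrm{new}}_t$ (which is a
consequence of the boundedness of $f_t$ on $[-A, A]^d$) yield

$$T_{3,n} \le \frac{4}{n_t} \sum_{i=n_l+1}^{n_l+n_t} |\hat q^{w}_{n,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \le \mathrm{const}.$$

With $u = \log(n)/n$, we get

$$E T_{3,n} \le \frac{\log n}{n} \left( 1 + c_6 \left( \log(\mathrm{const}) - \log\left( \frac{\log n}{n} \right) \right) \right) \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}.$$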



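Furthermore

(6.11) $$E T_{5,n} = E\{ E\{ T_{5,n} \mid X^{t,\mathrm{new}}_{i,\,t:w_{\max}(t)+1}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} = 144 \cdot E \int |\hat q^{w,p_n}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx).$$

Consequently, it remains to verify that

(6.12) $$E \int |\hat q^{w,p_n}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}$$

for some suitably selected $p_n \in \mathcal{P}_n$.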

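To bound $E T_{5,n}$, we use the error decomposition

$$\begin{aligned}
\int |\hat q^{w,p_n}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx)
= {} & \left( \int |\hat q^{w,p_n}_{n_l,t}(x) - q_t(x)|^2 \, \mu_t(dx) - \frac{2}{n_l} \sum_{i=1}^{n_l} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right) \\
& + \left( \frac{2}{n_l} \sum_{i=1}^{n_l} |\hat q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 - \frac{2}{n_l} \sum_{i=1}^{n_l} |\tilde q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 \right) \\
& + \left( \frac{2}{n_l} \sum_{i=1}^{n_l} |\tilde q^{w,p_n}_{n_l,t}(X_{i,t}) - q_t(X_{i,t})|^2 - \frac{4}{n_l} \sum_{i=1}^{n_l} |\tilde q^{w,p_n}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \right) \\
& + \frac{4}{n_l} \sum_{i=1}^{n_l} |\tilde q^{w,p_n}_{n_l,t}(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \\
= {} & \sum_{j=6}^{9} T_{j,n}
\end{aligned}$$

with

$$p_n = (k, 2^{l}) \quad \text{where} \quad l = \left\lceil \log_2\left( C^{-2/(2p+d)} (n/\log(n))^{-1/(2p+d)} \right) \right\rceil .$$

Because $q_t$ is bounded in absolute value by L, we have

$$T_{7,n} \le 0 \quad \text{and} \quad E T_{7,n} \le 0.$$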
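Furthermore, in the same way as for $T_{2,n}$, we obtain from (3.7) and (6.8)

$$\begin{aligned}
E T_{8,n} & \le 4 E\left\{ E\left\{ \frac{1}{n_l} \sum_{i=1}^{n_l} |q_t(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \;\Big|\; \mathcal{D}_{n,t+1} \right\} \right\} \\
& = 4 E \int |q_t(x) - q^{w,\mathrm{new}}_t(x)|^2 \, \mu_t(dx) \\
& \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)},
\end{aligned}$$

where the last equality follows from the fact that the conditional expectation
$q^{w,\mathrm{new}}_t(x)$ does not depend on data from time t.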

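Next, we bound $T_{6,n}$. The functions $\hat q^{w,p_n}_{n_l,t}$ and $q_t$ are bounded in absolute
value by L, and $\tilde q^{w,p_n}_{n_l,t}$ belongs to the linear vector space $\mathcal{H}_{n,p_n}$, whose
dimension $D_n$ is bounded by some constant (depending on A and k) times
$C^{2d/(2p+d)} \cdot (n/\log(n))^{d/(2p+d)}$. As in the proof of Theorem 11.3 in [14] [in
particular, the proof of inequality (11.6)], this implies

$$\begin{aligned}
E T_{6,n} & = E\{ E\{ T_{6,n} \mid \mathcal{D}_{n,t+1} \} \} \\
& \le \mathrm{const} \cdot L^2 \, \frac{(\log n_l + 1) \cdot C^{2d/(2p+d)} \cdot (n/\log(n))^{d/(2p+d)}}{n_l} \\
& \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}.
\end{aligned}$$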

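Finally, we bound $T_{9,n}$. With

$$\sigma^2 = \sup_{x \in \mathbb{R}^d} E^*\{ |Y^{w,\mathrm{new}}_{i,t}|^2 \mid X_{i,t} = x \} \le 4 L^2 < \infty,$$

we can conclude from Theorem 11.1 in [14] that

$$\begin{aligned}
E\{ T_{9,n} \mid X_{i,t}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \}
& \le 4 \sigma^2 \cdot \frac{D_n}{n_l} + 4 \min_{h \in \mathcal{H}_{n,p_n}} \frac{1}{n_l} \sum_{i=1}^{n_l} |h(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 \\
& \le \mathrm{const} \cdot \sigma^2 \cdot \frac{C^{2d/(2p+d)}}{n^{2p/(2p+d)} \cdot \log(n)^{d/(2p+d)}} + 4 \min_{h \in \mathcal{H}_{n,p_n}} \frac{1}{n_l} \sum_{i=1}^{n_l} |h(X_{i,t}) - q^{w,\mathrm{new}}_t(X_{i,t})|^2 .
\end{aligned}$$

Therefore,

$$\begin{aligned}
E T_{9,n} & = E\{ E\{ T_{9,n} \mid X_{i,t}\ (i = 1, \ldots, n_l),\ \mathcal{D}_{n,t+1} \} \} \\
& \le \mathrm{const} \cdot \sigma^2 \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)} + 4 \min_{h \in \mathcal{H}_{n,p_n}} E \int |h(x) - q^{w,\mathrm{new}}_t(x)|^2 \, \mu_t(dx) \\
& \le \mathrm{const} \cdot \sigma^2 \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)} + 8 E \int |q^{w,\mathrm{new}}_t(x) - q_t(x)|^2 \, \mu_t(dx) + 8 \min_{h \in \mathcal{H}_{n,p_n}} \int |h(x) - q_t(x)|^2 \, \mu_t(dx).
\end{aligned}$$

Note that for the last term in the last inequality (without the factor 8) we
get

$$\min_{h \in \mathcal{H}_{n,p_n}} \int |h(x) - q_t(x)|^2 \, \mu_t(dx) \le \min_{h \in \mathcal{H}_{n,p_n}} \sup_{x \in [-A,A]^d} |h(x) - q_t(x)|^2 .$$

Because we have assumed that $q_t$ is $(p, C)$-smooth, there exists an $h \in \mathcal{H}_{n,p_n}$
with

$$\sup_{x \in [-A,A]^d} |h(x) - q_t(x)| \le c_9 \cdot C \cdot \delta_n^{p},$$

where $\delta_n = C^{-2/(2p+d)} \cdot (n/\log(n))^{-1/(2p+d)}$ is the edge length in the cubic
partition used in the definition of the spline space; see Theorem 12.8 in [24].
We conclude that

$$\begin{aligned}
\min_{h \in \mathcal{H}_{n,p_n}} \int |h(x) - q_t(x)|^2 \, \mu_t(dx)
& \le c_9^2 \cdot C^2 \cdot \delta_n^{2p} \\
& = c_9^2 \cdot C^2 \cdot C^{-4p/(2p+d)} \cdot (n/\log(n))^{-2p/(2p+d)} \\
& \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}.
\end{aligned}$$

From (3.7), (6.8) and the above inequality we see that $E T_{9,n}$ satisfies

$$E T_{9,n} \le \mathrm{const} \cdot C^{2d/(2p+d)} \cdot \left( \frac{\log n}{n} \right)^{2p/(2p+d)}$$

and hence has an upper bound with the proper rate. The proof of Theorem 4.4 is
complete. □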